(57) Abstract: A method operable in a digital image acquisition system having no photographic film is provided. The method comprises receiving a relatively low resolution image of a scene from an image stream, wherein the scene potentially includes one or more faces. At least one high quality face classifier is applied to the image to identify relatively large and medium sized face regions, and at least one relaxed face classifier is applied to the image to identify relatively small sized face regions. A relatively high resolution image of nominally the same scene is received, and at least one high quality face classifier is applied to the identified small sized face regions in the higher resolution version of said image.

Face Tracking in a Camera Processor

FIELD OF THE INVENTION 

The present invention provides an improved method and apparatus for image processing in acquisition devices. In particular, the invention provides improved face tracking in a digital image acquisition device, such as a camera phone.



BACKGROUND 

Figure 1 illustrates digital image acquisition apparatus, for example a camera phone. The apparatus 10 comprises an Image Signal Processor (ISP) 14, which is, in general, a general purpose CPU with relatively limited processing power. Typically, the ISP 14 is a dedicated chip or chip-set with a sensor interface 20 having dedicated hardware units that facilitate image processing, including an image pipeline 22. Images acquired by an imaging sensor 16 are provided to the ISP 14 through the sensor interface 20. The apparatus further comprises a relatively powerful host processor 12, for example an ARM9, which is arranged to receive an image stream from the ISP 14.

The apparatus 10 is equipped with a display 18, such as an LCD, for displaying preview images as well as any main image acquired by the apparatus. Preview images are generated either automatically once the apparatus is switched on, or only in a pre-capture mode in response to half-pressing a shutter button. A main image is typically acquired by fully depressing the shutter button.

Conventionally, high level image processing, such as face tracking, is run on the host processor 12, which provides feedback to the pipeline 22 of the ISP 14. The ISP 14 then renders, adjusts and processes subsequent image(s) in the image stream based on the feedback provided by the host processor 12, typically through an I2C interface 24. Thus, acquisition parameters of the subsequent image in the stream may be adjusted such that the image displayed to the user is enhanced.

Such acquisition parameters include focus, exposure and white balance.

Focus determines the distinctness or clarity of an image, or of a relevant portion of an image, and depends on the focal length of a lens and the capture area of the imaging sensor 16. Methods of determining whether an image is in focus are well known in the art. For example, if a face region is detected in an image, then, given that most faces are approximately the same size, the size of the face within the acquired image indicates how far the subject is from the apparatus, and an appropriate focal length can be chosen for a subsequent image to ensure the face will appear in focus in the image.
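A minimal sketch of the size-to-distance reasoning above, assuming a simple pinhole-camera model and an average face width of 150 mm; both the constant and the optics values in the usage line are illustrative assumptions, not figures from this specification. The estimated distance would then drive the choice of focus setting.

```python
# Hypothetical sketch: pinhole-camera estimate of subject distance from the
# width of a detected face. AVERAGE_FACE_WIDTH_MM and the optics passed in
# are illustrative assumptions, not values taken from this document.

AVERAGE_FACE_WIDTH_MM = 150.0  # assumed typical adult face width


def estimate_subject_distance_mm(face_width_px: float,
                                 image_width_px: int,
                                 sensor_width_mm: float,
                                 focal_length_mm: float) -> float:
    """Return the approximate lens-to-face distance in millimetres."""
    if face_width_px <= 0 or image_width_px <= 0:
        raise ValueError("widths must be positive")
    # Width of the face image on the sensor, in millimetres.
    face_on_sensor_mm = face_width_px * sensor_width_mm / image_width_px
    # Similar triangles: distance / real width = focal length / width on sensor.
    return focal_length_mm * AVERAGE_FACE_WIDTH_MM / face_on_sensor_mm


# A 32-pixel-wide face in a 160-pixel-wide preview frame, with an assumed
# 4 mm wide sensor and 4 mm focal length, is roughly 750 mm away.
print(round(estimate_subject_distance_mm(32, 160, 4.0, 4.0)))
```

Halving the face width in pixels doubles the estimated distance, which is why small faces push the lens towards infinity focus.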

Other methods can be based on the overall level of sharpness of an image or portion of an image, for example, as indicated by the values of high frequency DCT coefficients in the image. When these are highest in the image or a region of interest, say a face region, the image can be assumed to be in focus. Thus, by adjusting the focal length of the lens to maximize sharpness, the focus of an image may be enhanced.
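The sharpness criterion can be sketched as follows, assuming square grey-level blocks, an orthonormal 2-D DCT-II, and a hypothetical cutoff that treats coefficients with frequency index u+v >= 4 as "high frequency". The lens position whose region has the most high-frequency energy is taken as best focus; this is an illustration of the idea, not the apparatus's actual autofocus routine.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def high_frequency_energy(block, cutoff=4):
    """Sum of squared DCT coefficients with u + v >= cutoff (assumed split)."""
    c = dct_2d(block)
    n = len(block)
    return sum(c[u][v] ** 2 for u in range(n) for v in range(n) if u + v >= cutoff)


def best_focus_position(region_by_position):
    """Return the lens position whose region (e.g. a face) is sharpest."""
    return max(region_by_position,
               key=lambda p: high_frequency_energy(region_by_position[p]))
```

In use, the same face region would be sampled at several lens positions and `best_focus_position` would pick the one with the strongest fine detail.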

Exposure of an image relates to the amount of light falling on the imaging sensor 16 during acquisition of an image. Thus, an under-exposed image appears quite dark and has an overall low luminance level, whereas an over-exposed image appears quite bright and has an overall high luminance level. Shutter speed and lens aperture affect the exposure of an image and can therefore be adjusted to improve image quality and the processing of an image. For example, it is well known that face detection and recognition are sensitive to over- or under-exposure of an image, and so exposure can be adjusted to optimize the detection of faces within an image stream.
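A minimal sketch of face-driven exposure adjustment, assuming 8-bit RGB pixels, BT.601 luma weights and a hypothetical target face luma of 118 (roughly mid-grey). The returned value is a correction in stops: positive means the next frame should receive more light (slower shutter or wider aperture), negative means less.

```python
import math


def face_mean_luma(rgb, box):
    """Mean BT.601 luma of an 8-bit RGB image inside box = (x, y, w, h)."""
    x, y, w, h = box
    total = 0.0
    for row in rgb[y:y + h]:
        for r, g, b in row[x:x + w]:
            total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (w * h)


def exposure_correction_ev(mean_luma, target_luma=118.0):
    """Stops to add (+) or remove (-) so the face luma reaches target_luma.

    target_luma is an assumed mid-grey goal, not a value from the source.
    """
    # Clamp to avoid log2 of zero for fully black regions.
    return math.log2(target_luma / max(mean_luma, 1.0))
```

A face that is half as bright as the target yields a correction of about +1 EV, i.e. doubling the exposure.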

Because most light sources are not 100% pure white, objects illuminated by a light source will be subjected to a colour cast. For example, a halogen light source illuminating a white object will cause the object to appear yellow. In order for a digital image acquisition apparatus to compensate for the colour cast, i.e. perform white balance, it requires a white reference point. Thus, by identifying a point in an image that should be white, for example the sclera of an eye, all other colours in the image may be compensated accordingly. This compensation information may then be utilised to determine the type of illumination under which an image should be acquired.
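A minimal sketch of white-reference compensation, assuming the average RGB of a region that should be white (such as sclera pixels) is already known. Gains are normalised to the green channel, a common convention but an assumption here, so that the reference becomes neutral grey and every other pixel is corrected by the same factors.

```python
def white_balance_gains(reference_rgb):
    """Per-channel gains that map a should-be-white reference to neutral."""
    r, g, b = reference_rgb
    if min(r, g, b) <= 0:
        raise ValueError("reference channels must be positive")
    # Normalise to green so overall brightness is roughly preserved.
    return (g / r, 1.0, g / b)


def apply_gains(pixel, gains):
    """Apply white-balance gains to one RGB pixel, clipping at 8-bit white."""
    return tuple(min(255.0, c * k) for c, k in zip(pixel, gains))


# A sclera patch rendered yellowish under assumed halogen light.
gains = white_balance_gains((240, 220, 160))
print(apply_gains((240, 220, 160), gains))
```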

While adjusting acquisition parameters such as those described above is useful and can improve image quality and processing, the feedback loop to the ISP 14 is relatively slow, which delays providing the ISP 14 with the information needed to correct the focus, exposure and white balance of an image. As a result, in a fast changing scene, adjustments indicated by the host processor 12 may no longer be appropriate by the time the ISP 14 applies them to subsequent images of the stream. Furthermore, most of the processing power available to the host processor 12 is typically required to run the face tracker application, leaving minimal processing power available for carrying out value added processing.

It is an object of the present invention to mitigate the disadvantages associated with the prior art and to provide an improved method of face tracking in a digital image acquisition device.

DESCRIPTION OF THE INVENTION

The present invention provides a method operable in a digital image acquisition system as claimed in claim 1.

Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a digital image acquisition apparatus in which a preferred embodiment of the present invention may be implemented; and

Figure 2 is a workflow illustrating a preferred embodiment of the present invention.



DESCRIPTION OF PREFERRED EMBODIMENTS

Face tracking for digital image acquisition devices includes methods of marking human faces in a series of images, such as a video stream or a camera preview. Face tracking can be used to indicate the locations of faces in an image to a photographer, or to allow post-processing of the images based on knowledge of the locations of the faces. Face tracker applications can also be used to adaptively adjust acquisition parameters of an image, such as focus, exposure and white balance, based on face information in order to improve the quality of acquired images.

In general, face tracking systems employ two principal modules: (i) a detection module for locating new candidate face regions in an acquired image or a sequence of images; and (ii) a tracking module for confirming face regions.

A well-known method of fast face detection is disclosed in US 2002/0102024, hereinafter Viola-Jones. In Viola-Jones, a chain (cascade) of 32 classifiers based on rectangular (and increasingly refined) Haar features is used with an integral image, derived from an acquired image, by applying the classifiers to a sub-window within the integral image. For a complete analysis of an acquired image, this sub-window is shifted incrementally across the integral image until the entire image has been covered.
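The integral image that the cascade operates on can be sketched as below. After one pass over the image, the sum of any rectangle costs four lookups, which is what makes rectangular Haar features cheap to evaluate. The two-rectangle feature shown is one generic Haar-like feature chosen for illustration, not one of the 32 classifiers referred to above.

```python
def integral_image(img):
    """Integral image with a zero border: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Illustrative Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```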

In addition to moving the sub-window across the entire integral image, the sub-window is also scaled up or down to cover the possible range of face sizes. It will therefore be seen that the resolution of the integral image is determined by the smallest sized classifier sub-window, i.e. the smallest size of face to be detected, as larger sized sub-windows can use intermediate points within the integral image for their calculations.
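The shift-and-scale search can be sketched as a window enumerator. The base window of 7 matches the smallest classifier in the preferred embodiment, while the scale step of 1.25 and the shift of 10% of the window size are hypothetical choices. Every window, however large, indexes the same integral image, so only the smallest window constrains its resolution.

```python
def enumerate_windows(img_w, img_h, base=7, scale_step=1.25, shift_frac=0.1):
    """List (x, y, size) for square sub-windows over all positions and scales.

    scale_step and shift_frac are assumed values for illustration only.
    """
    windows = []
    size = float(base)
    while round(size) <= min(img_w, img_h):
        s = int(round(size))
        # Larger windows move in proportionally larger steps.
        step = max(1, int(round(s * shift_frac)))
        for y in range(0, img_h - s + 1, step):
            for x in range(0, img_w - s + 1, step):
                windows.append((x, y, s))
        size *= scale_step
    return windows


# On a 160 by 120 preview frame the search starts at 7x7 windows.
wins = enumerate_windows(160, 120)
print(min(s for _, _, s in wins), max(s for _, _, s in wins))
```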

A number of variants of the original Viola-Jones algorithm are known in the literature, such as disclosed in International Patent Application No. PCT/EP2007/005330 (FN143).

In the present embodiment, a face tracking process runs on the ISP 14 as opposed to the host processor 12. Thus, more processing power of the host processor 12 is available for further value added applications, such as face recognition. Furthermore, parameters of an acquired image, such as focus, exposure and white balance, can be adaptively adjusted more efficiently by the ISP 14.

As will be appreciated, face tracking applications carried out on high resolution images will generally achieve more accurate results than those carried out on relatively lower resolution images. Furthermore, tracking relatively small faces within an image generally requires proportionally more processing than tracking larger faces.

The processing power of the ISP 14 is, of course, limited, and so the face tracking application according to the present invention is arranged to run efficiently on the ISP 14.

In the preferred embodiment, a typical input frame resolution is 160 by 120, and face sizes are categorised as small, medium or large. Medium sized and large sized faces in an image are detected by applying 14x14 and 22x22 high quality classifiers respectively, e.g. relatively long cascade classifiers or classifiers with a relatively high threshold for accepting a face.

The distance of a subject face from the acquisition apparatus determines the size of the subject face in an image. Clearly, a first subject face located at a greater distance from the acquisition device than a second subject face will appear smaller. Smaller sized faces comprise fewer pixels, and thus less information can be derived from them. As such, detection of smaller sized faces is inherently less reliable, even though it requires proportionally more processing than detection of larger faces.

In the preferred embodiment, small sized faces are detected with a relaxed 7x7 classifier, e.g. a short-cascade classifier or a classifier with a lower threshold for accepting a face. Using a more relaxed classifier reduces the processing power that would otherwise be required to detect small sized faces.
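The difference between a high quality and a relaxed classifier can be sketched as two ways of deriving a weaker cascade from a stronger one: truncating it (short cascade) or lowering its stage thresholds. The stage scorers here are hypothetical callables standing in for trained Haar-feature stages, and the window representation is left abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Cascade:
    stages: Sequence[Callable[[object], float]]  # hypothetical stage scorers
    thresholds: Sequence[float]                  # one acceptance threshold per stage

    def accepts(self, window) -> bool:
        # Early rejection: a window is discarded at the first failing stage.
        for stage, threshold in zip(self.stages, self.thresholds):
            if stage(window) < threshold:
                return False
        return True


def relaxed(cascade: Cascade, n_stages: Optional[int] = None,
            slack: float = 0.0) -> Cascade:
    """Derive a relaxed classifier by truncating and/or lowering thresholds."""
    n = len(cascade.stages) if n_stages is None else n_stages
    return Cascade(list(cascade.stages[:n]),
                   [t - slack for t in cascade.thresholds[:n]])
```

Both forms do less work per window and accept more windows, which is exactly the trade-off that produces the extra false positives discussed next.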

Nonetheless, it is appreciated that the application of such a relaxed classifier results in a larger number of false positives, i.e. non-face regions being classified as faces. As such, the adjustment of image acquisition parameters is applied differently in response to detection of small faces, and the further processing of images is different for small faces than for medium or large faces, as explained in more detail below.

Figure 2 shows a workflow illustrating a preferred 
embodiment of the present invention. 

On activation, the apparatus 10 automatically captures and stores a series of images at close intervals so that sequential images are nominally of the same scene. Such a series of images may include a series of preview images, post-view images, or a main acquired image.

In preview mode, the imaging sensor 16 provides the ISP 14 with a low resolution image, e.g. 160 by 120, from an image stream, step 100.

The ISP 14 applies at least one high quality classifier cascade to the image to detect large and medium sized faces, step 110. Preferably, both 14x14 and 22x22 face classifier cascades are applied to the image.

The ISP 14 also applies at least one relaxed face classifier to the image to detect small faces, step 120. Preferably, a 7x7 face classifier is applied to the image.

Based on knowledge of the faces retrieved from the classifiers, image acquisition parameters for a subsequent image in the stream may be adjusted to enhance the image provided to the display 18 and/or to improve processing of the image. In the preferred embodiment, knowledge of the faces retrieved from the classifiers is utilised to adjust one or more of the focus, exposure and white balance of a next image in the image stream, step 130.
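Steps 110 to 130 for one preview frame can be sketched as below. The classifier and parameter-adjustment callbacks are hypothetical hooks (the specification does not define an API); each classifier hook is assumed to return a list of (box, confidence) pairs for a given window size.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in preview pixels
# Hypothetical hook: (frame, window_size) -> [(box, confidence), ...]
Classifier = Callable[[object, int], List[Tuple[Box, float]]]


@dataclass
class Detection:
    box: Box
    size_class: str   # "small", "medium" or "large"
    confidence: float


def process_preview_frame(frame, high_quality: Classifier, relaxed: Classifier,
                          adjust_params: Callable[[List[Detection]], None]
                          ) -> List[Detection]:
    detections: List[Detection] = []
    # Step 110: high quality cascades for large (22x22) and medium (14x14) faces.
    for window, size_class in ((22, "large"), (14, "medium")):
        detections += [Detection(b, size_class, c)
                       for b, c in high_quality(frame, window)]
    # Step 120: relaxed 7x7 classifier for small faces.
    detections += [Detection(b, "small", c) for b, c in relaxed(frame, 7)]
    # Step 130: feed face knowledge back into focus/exposure/white balance.
    adjust_params(detections)
    return detections
```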

Knowledge of the faces received from the classifiers comprises information relating to the location of the faces, the size of the faces and the probability of each identified face actually being a face. International Patent Application No. PCT/EP2007/006540 (FN182/FN232/FN214) discusses determining a confidence level indicating the probability of a face existing at a given location. This information may be utilised to determine a weighting for each face to thereby facilitate the adjustment of the acquisition parameters.

In general, a large face will comprise more information than a relatively smaller face. However, if the larger face has a greater probability of being falsely identified as a face and/or is positioned at a non-central position of the image, it could be allocated a lower weighting than a relatively smaller face that is positioned at the centre of the image and has a lower probability of being a false positive. Thus, the information derived from the smaller face could be used to adjust the acquisition parameters in preference to the information derived from the large face.
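One possible weighting that reproduces the behaviour described above multiplies a size term, the classifier confidence and a centrality term. The particular formula, including the square root that damps the influence of area, is a hypothetical choice for illustration; the specification only names the inputs.

```python
import math


def face_weight(box, confidence, img_w, img_h):
    """Hypothetical weighting: information (size) x confidence x centrality."""
    x, y, w, h = box
    # Larger faces carry more information; sqrt damps the dominance of area.
    size_term = math.sqrt((w * h) / (img_w * img_h))
    # Normalised distance of the face centre from the image centre
    # (0 at the centre, 1 at a corner).
    cx, cy = x + w / 2, y + h / 2
    dist = (math.hypot(cx - img_w / 2, cy - img_h / 2)
            / math.hypot(img_w / 2, img_h / 2))
    centrality = 1.0 - dist
    return size_term * confidence * centrality


# Large, off-centre, low-confidence face vs. small, central, confident face
# in a 160 by 120 preview frame.
large = face_weight((0, 0, 40, 40), 0.3, 160, 120)
small = face_weight((76, 56, 8, 8), 0.9, 160, 120)
print(small > large)
```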
In this embodiment, where only small sized faces are detected in the image, knowledge of the small faces is utilised only to adjust the exposure of the next image in the stream. It will be appreciated that, although the relaxed classifier passes some false positives, these do not severely affect the adjustment of the exposure.

Focus adjustment is not performed on the next image based on small faces, because the lens of the apparatus will be focused at infinity for small faces and there is little to be gained from such adjustment. White balance is not adjusted for small faces because they are considered too small to provide any significant white balance information. Nonetheless, focus and white balance can each be usefully adjusted based on detection of medium and large sized faces.
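The rule above can be expressed as a mapping from each acquisition parameter to the face size classes allowed to drive it. The function name and string labels are hypothetical; the policy itself follows the text: small faces influence exposure only, while medium and large faces may also drive focus and white balance.

```python
def parameter_sources(size_classes):
    """Map each acquisition parameter to the detected size classes that may drive it."""
    present = set(size_classes)
    return {
        # Relaxed-classifier false positives do not badly disturb exposure.
        "exposure": present & {"small", "medium", "large"},
        # Small faces imply focus at infinity and carry too little colour data.
        "focus": present & {"medium", "large"},
        "white_balance": present & {"medium", "large"},
    }


print(parameter_sources(["small"]))
```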

In the preferred embodiment, once a user acquires a full-sized main image, e.g. by clicking the shutter, and this is communicated to the host, step 150, the detected/tracked face regions are also communicated to the host processor 12, step 140.

In alternative embodiments, full-sized images may be acquired occasionally without user intervention, either at regular intervals (e.g. every 30 preview frames, or every 3 seconds) or responsive to an analysis of the preview image stream. For example, where only smaller faces are detected, it may be desirable to occasionally re-confirm the information deduced from such images.
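A sketch of the two triggering policies described above, using the example intervals of 30 preview frames or 3 seconds. Combining the analysis-driven policy with the same interval, so that re-confirmation stays occasional, is an assumption made for illustration.

```python
def should_capture_full_size(frames_since_last, seconds_since_last, size_classes,
                             every_n_frames=30, every_s=3.0,
                             only_when_small=False):
    """Decide whether to grab a full-sized frame without user intervention."""
    interval_elapsed = (frames_since_last >= every_n_frames
                        or seconds_since_last >= every_s)
    if not only_when_small:
        return interval_elapsed  # purely periodic policy
    # Analysis-driven policy (assumed form): re-confirm occasionally, and only
    # when the preview stream shows small faces alone.
    classes = set(size_classes)
    return interval_elapsed and bool(classes) and classes <= {"small"}
```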

After acquisition of a full-sized main image, the host processor 12 retests the face regions identified by the relaxed small face classifier on the larger (higher resolution) main image, typically having a resolution of 320x240 or 640x480, with a high quality classifier, step 160. This verification mitigates or eliminates false positives passed by the relaxed face classifier on the lower resolution image. Since the retesting phase is carried out on a higher resolution version of the image, the small sized faces comprise more information and are thereby detectable by larger window size classifiers. In this embodiment, both 14x14 and 22x22 face classifiers are employed for verification.
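The verification step can be sketched as mapping each small-face box from preview coordinates into the main image and retesting only that region. The 25% margin around each box and the `high_quality_accepts` callback are hypothetical; the callback stands in for the 14x14 and 22x22 classifiers applied within the region.

```python
def scale_box(box, src_size, dst_size, margin=0.25):
    """Map (x, y, w, h) from a preview frame into a larger image, with a margin."""
    x, y, w, h = box
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Enlarge the region slightly (assumed 25%) so the face is not clipped.
    mx, my = w * margin, h * margin
    x0 = max(0, int((x - mx) * sx))
    y0 = max(0, int((y - my) * sy))
    x1 = min(dst_size[0], int(round((x + w + mx) * sx)))
    y1 = min(dst_size[1], int(round((y + h + my) * sy)))
    return (x0, y0, x1 - x0, y1 - y0)


def verify_small_faces(small_boxes, preview_size, main_size, high_quality_accepts):
    """Keep only the small-face regions a high quality classifier confirms."""
    verified = []
    for box in small_boxes:
        roi = scale_box(box, preview_size, main_size)
        if high_quality_accepts(roi):  # hypothetical classifier hook
            verified.append(roi)
    return verified
```

Because only a few small regions are examined, the cost stays low even at 640x480, which is why the text notes that verification could also run on the ISP 14.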

Based on the verification, the main image can be adjusted, for example, by adjusting the luminance values of the image to more properly illuminate a face or by adjusting the white balance of the image. Other corrections, such as red-eye correction or blur correction, are also improved with improved face detection.

In any case, the user is then presented with a refined 
image on the display 18, enhancing the user experience, step 
170. 

The verification phase requires minimal computation, allowing the processing power of the host processor 12 to be utilised for further value added applications, for example, face recognition applications, real time blink detection and prevention, smile detection, and special real time face effects such as morphing.

In the preferred embodiment, a list of verified face locations is provided back to the ISP 14, indicated by the dashed line, and this information can be utilised to improve face tracking or image acquisition parameters within the ISP 14.

In an alternative embodiment, the verification phase can be carried out on the ISP 14 since, although verification is carried out on a higher resolution image, the classifiers need not be applied to the whole image, and as such little processing power is required.

The invention is not limited to the embodiment(s) described herein but can be amended or modified without departing from the scope of the present invention.



WO 2009/039876 PCT/EP2007/009763

Claims:


1. A method operable in a digital image acquisition system having no photographic film, said method comprising:

a) receiving a relatively low resolution image of a scene from an image stream, said scene potentially including one or more faces;

b) applying at least one high quality face classifier to said image to identify relatively large and medium sized face regions;

c) applying at least one relaxed face classifier to said image to identify relatively small sized face regions;

d) receiving a relatively high resolution image of nominally the same scene; and

e) applying at least one high quality face classifier to said identified small sized face regions in said higher resolution version of said image.

2. The method according to claim 1 comprising performing steps a) to c) on a first processor and performing steps d) and e) on a separate second processor.

3. The method of claim 1 wherein each of steps b) and c) includes providing information including face size, face location, and an indication of a probability of said image including a face at or in the vicinity of said face region.

4. The method of claim 3 further comprising generating a weighting based on said information.

5. The method according to claim 3 comprising adjusting image acquisition parameters of a subsequent image in said image stream based on said information.

6. The method according to claim 5 wherein said adjusted image acquisition parameters include at least one of focus, exposure and white balance.

7. The method according to claim 5 wherein said subsequent image in said stream is a preview image.

8. The method according to claim 5 wherein said subsequent image in said stream is a main acquired image.

9. The method according to claim 5 further comprising displaying said subsequent image to a user.

10. The method according to claim 2 further comprising performing value added applications on said high resolution image on said separate second processor.

11. The method according to claim 1 in which said at least one high quality face classifier comprises a relatively long cascade classifier or a classifier with a relatively high threshold for accepting a face and wherein said relaxed classifier comprises a relatively short cascade classifier or a classifier with a relatively low threshold for accepting a face.

12. A digital image acquisition apparatus comprising a first processor operably connected to an imaging sensor, and a second processor operably connected to said first processor, said first processor being arranged to provide an acquired image to said second processor and said second processor being arranged to store said image,

said first processor being arranged to apply at least one high quality face classifier to a relatively low resolution image of a scene from an image stream, said scene potentially including one or more faces, to identify relatively large and medium sized face regions, and to apply at least one relaxed face classifier to said image to identify relatively small sized face regions; and

said second processor being arranged to receive a relatively high resolution image of nominally the same scene and to apply at least one high quality face classifier to said identified small sized face regions in said higher resolution version of said image.